AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)

Neural Information Processing Systems, Sep-30-2025, 10:14:03 GMT

Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm

randomized kaczmarz algorithm, stochastic gradient descent, weighted sampling

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.63)

Neural Information Processing Systems, Jan-18-2025, 11:46:18 GMT

Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm

We improve a recent guarantee of Bach and Moulines on the linear convergence of SGD for smooth and strongly convex objectives, reducing a quadratic dependence on the strong convexity to a linear dependence. Furthermore, we show how reweighting the sampling distribution (i.e. importance sampling) is necessary in order to further improve convergence, and obtain a linear dependence on the average smoothness, dominating previous results. Our results are based on a connection we make between SGD and the randomized Kaczmarz algorithm, which allows us to transfer ideas between the separate bodies of literature studying each of the two methods.
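
The SGD/Kaczmarz connection can be checked numerically. The sketch below is an illustration under stated assumptions, not code from the paper: it takes $f_i(x) = \tfrac12(\langle a_i, x\rangle - b_i)^2$, samples row $i$ with probability $p_i = \|a_i\|^2/\|A\|_F^2$, and uses step size $\gamma = 1/\|A\|_F^2$. With these choices the importance-weighted SGD step $x - (\gamma/p_i)\nabla f_i(x)$ coincides with the Kaczmarz projection onto $\{z : \langle a_i, z\rangle = b_i\}$. The helper names (`kaczmarz_step`, `weighted_sgd_step`, `compare_steps`) are hypothetical.

```python
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def kaczmarz_step(x, a, b_i):
    """Project x onto the hyperplane {z : <a, z> = b_i}."""
    r = (b_i - dot(a, x)) / dot(a, a)
    return [xj + r * aj for xj, aj in zip(x, a)]


def weighted_sgd_step(x, a, b_i, p_i, gamma):
    """Importance-weighted SGD step x - (gamma / p_i) * grad f_i(x),
    with f_i(x) = 0.5 * (<a, x> - b_i) ** 2."""
    resid = dot(a, x) - b_i
    scale = gamma / p_i
    return [xj - scale * resid * aj for xj, aj in zip(x, a)]


def compare_steps(m=6, n=3, seed=0):
    """For each row i, return the largest coordinate gap between the two updates."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    b = [rng.gauss(0.0, 1.0) for _ in range(m)]
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    fro2 = sum(dot(row, row) for row in A)  # ||A||_F^2
    gamma = 1.0 / fro2                      # assumed SGD step size
    gaps = []
    for a, b_i in zip(A, b):
        p_i = dot(a, a) / fro2              # P(row i) proportional to ||a_i||^2
        rk = kaczmarz_step(x, a, b_i)
        sgd = weighted_sgd_step(x, a, b_i, p_i, gamma)
        gaps.append(max(abs(u - v) for u, v in zip(rk, sgd)))
    return gaps


print(max(compare_steps()))  # tiny: the updates agree up to floating-point round-off
```

Since $\gamma/p_i = 1/\|a_i\|^2$, the two updates are algebraically identical; the script only confirms this up to round-off.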

artificial intelligence, machine learning, randomized kaczmarz algorithm

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)

Bou-Rabee, Nawaf, Eberle, Andreas, Oberdörster, Stefan

Ballistic Convergence in Hit-and-Run Monte Carlo and a Coordinate-free Randomized Kaczmarz Algorithm

arXiv.org Machine Learning, Dec-10-2024

Hit-and-Run is a coordinate-free Gibbs sampler, yet the quantitative advantages of its coordinate-free property remain largely unexplored beyond empirical studies. In this paper, we prove sharp estimates for the Wasserstein contraction of Hit-and-Run for Gaussian target measures via coupling methods and deduce mixing time bounds. Our results uncover ballistic and superdiffusive convergence rates in certain settings. Furthermore, we extend these insights to a coordinate-free variant of the randomized Kaczmarz algorithm, an iterative method for linear systems, and demonstrate analogous convergence rates. These findings offer new insights into the advantages and limitations of coordinate-free methods for both sampling and optimization.
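
For a Gaussian target, each Hit-and-Run step has a closed form: restricted to a line, the target is again a one-dimensional Gaussian. The sketch below assumes a centred target $N(0, Q^{-1})$ given by a precision matrix $Q$ and draws directions uniformly on the unit sphere. It illustrates the sampler only; it does not reproduce the paper's coupling estimates or its coordinate-free Kaczmarz variant, and the function names and test covariance are hypothetical.

```python
import math
import random


def hit_and_run_gaussian(Q, x0, steps, seed=0):
    """Hit-and-Run chain targeting the centred Gaussian N(0, Q^{-1}).

    Q is the symmetric positive definite precision matrix. Each step picks a
    direction u uniformly on the unit sphere, then samples the step length t
    exactly from the target restricted to the line {x + t*u}.
    """
    rng = random.Random(seed)
    n = len(x0)
    x = list(x0)
    chain = []
    for _ in range(steps):
        # A normalised standard Gaussian vector is uniform on the sphere.
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(ui * ui for ui in u))
        u = [ui / norm for ui in u]
        Qu = [sum(Q[i][j] * u[j] for j in range(n)) for i in range(n)]
        uQu = sum(ui * qi for ui, qi in zip(u, Qu))
        xQu = sum(xi * qi for xi, qi in zip(x, Qu))
        # Along the line the log-density is -0.5*uQu*t^2 - xQu*t + const,
        # so t ~ N(-xQu/uQu, 1/uQu).
        t = rng.gauss(-xQu / uQu, 1.0 / math.sqrt(uQu))
        x = [xi + t * ui for xi, ui in zip(x, u)]
        chain.append(x)
    return chain


def sample_covariance(points):
    """Sample covariance matrix of a list of n-dimensional points."""
    N, n = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / N for j in range(n)]
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / (N - 1)
             for j in range(n)] for i in range(n)]


# Target covariance [[1, rho], [rho, 1]]; Q is its inverse.
rho = 0.8
det = 1.0 - rho * rho
Q = [[1.0 / det, -rho / det], [-rho / det, 1.0 / det]]
chain = hit_and_run_gaussian(Q, [3.0, -3.0], steps=60000)
cov = sample_covariance(chain[1000:])  # discard a short burn-in
print(cov)  # should be close to [[1, 0.8], [0.8, 1]]
```

No coordinate axes enter the update: directions are isotropic, which is the coordinate-free property the abstract refers to.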

artificial intelligence, hit-and-run, machine learning

2412.07643

Country:

Europe > Austria > Vienna (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.68)

Alderman, Seth J., Luikart, Roan W., Marshall, Nicholas F.

Randomized Kaczmarz with geometrically smoothed momentum

arXiv.org Machine Learning, Jan-17-2024

This paper studies the effect of adding geometrically smoothed momentum to the randomized Kaczmarz algorithm, which is an instance of stochastic gradient descent on a linear least squares loss function. We prove a result about the expected error in the direction of singular vectors of the matrix defining the least squares loss. We present several numerical examples illustrating the utility of our result and pose several questions.
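
The abstract does not spell out the update, so the sketch below uses an assumed form for illustration only: $d_k$ is the ordinary randomized Kaczmarz projection step, the momentum $v_k = \beta v_{k-1} + (1-\beta) d_k$ averages past steps with geometrically decaying weights $(1-\beta)\beta^j$, and $x_{k+1} = x_k + v_k$. The paper's exact smoothing and parameters may differ; `rk_smoothed_momentum` and `demo` are hypothetical names.

```python
import itertools
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def rk_smoothed_momentum(A, b, beta, iters, seed=0):
    """Randomized Kaczmarz with an (assumed) geometrically smoothed momentum term.

    d_k is the usual Kaczmarz projection step; v_k = beta*v_{k-1} + (1-beta)*d_k
    is a geometrically weighted average of past steps, and x_{k+1} = x_k + v_k.
    beta = 0 recovers plain randomized Kaczmarz.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    norms2 = [dot(a, a) for a in A]
    cum = list(itertools.accumulate(norms2))  # sample rows with P(i) ~ ||a_i||^2
    x = [0.0] * n
    v = [0.0] * n
    for _ in range(iters):
        i = rng.choices(range(m), cum_weights=cum)[0]
        r = (b[i] - dot(A[i], x)) / norms2[i]
        v = [beta * vj + (1.0 - beta) * r * aij for vj, aij in zip(v, A[i])]
        x = [xj + vj for xj, vj in zip(x, v)]
    return x


def demo(beta, m=200, n=10, iters=3000, seed=1):
    """Relative error after `iters` steps on a random consistent system Ax = b."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [dot(a, x_true) for a in A]
    x = rk_smoothed_momentum(A, b, beta, iters, seed=seed)
    err = math.sqrt(sum((u - w) ** 2 for u, w in zip(x, x_true)))
    return err / math.sqrt(dot(x_true, x_true))


for beta in (0.0, 0.3):
    print(beta, demo(beta))
```

Setting `beta = 0` recovers plain randomized Kaczmarz, so the two variants can be compared on the same consistent system.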

corollary 1, singular vector, theorem 1

2401.09415

Country:

North America > United States > Oregon > Benton County > Corvallis (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Marshall, Nicholas F., Mickelin, Oscar

An optimal scheduled learning rate for a randomized Kaczmarz algorithm

arXiv.org Artificial Intelligence, Aug-9-2022

We study how the learning rate affects the performance of a relaxed randomized Kaczmarz algorithm for solving $A x \approx b + \varepsilon$, where $A x =b$ is a consistent linear system and $\varepsilon$ has independent mean zero random entries. We derive a learning rate schedule which optimizes a bound on the expected error that is sharp in certain cases; in contrast to the exponential convergence of the standard randomized Kaczmarz algorithm, our optimized bound involves the reciprocal of the Lambert-$W$ function of an exponential.
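
A relaxed randomized Kaczmarz iteration with a learning rate $\eta_k$ can be sketched as follows. Row sampling with probability $\|a_i\|^2/\|A\|_F^2$ is assumed, and the decaying schedule `min(1, 200/(k+1))` is a hypothetical stand-in chosen for illustration; it is not the Lambert-$W$ schedule derived in the paper. The point is qualitative: with noisy right-hand sides, a constant step $\eta = 1$ stalls at a noise-dependent error level, while a decaying step keeps averaging out the noise.

```python
import itertools
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def relaxed_rk(A, b, eta, iters, seed=0):
    """Relaxed randomized Kaczmarz: x <- x + eta(k) * (b_i - <a_i, x>) / ||a_i||^2 * a_i,
    with row i drawn with probability ||a_i||^2 / ||A||_F^2 (assumed sampling rule)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    norms2 = [dot(a, a) for a in A]
    cum = list(itertools.accumulate(norms2))
    x = [0.0] * n
    for k in range(iters):
        i = rng.choices(range(m), cum_weights=cum)[0]
        step = eta(k) * (b[i] - dot(A[i], x)) / norms2[i]
        x = [xj + step * aij for xj, aij in zip(x, A[i])]
    return x


def noisy_system(m=500, n=20, noise=0.1, seed=1):
    """Consistent system A x = b, observed as b + epsilon with independent mean-zero noise."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b_noisy = [dot(a, x_true) + rng.gauss(0.0, noise) for a in A]
    return A, b_noisy, x_true


def distance(u, w):
    return math.sqrt(sum((ui - wi) ** 2 for ui, wi in zip(u, w)))


def constant_rate(k):
    return 1.0  # standard randomized Kaczmarz step


def decaying_rate(k):
    # Hypothetical schedule (NOT the paper's): eta = 1 for a short burn-in, then ~200/k.
    return min(1.0, 200.0 / (k + 1))


A, b_noisy, x_true = noisy_system()
err_const = distance(relaxed_rk(A, b_noisy, constant_rate, 20000), x_true)
err_decay = distance(relaxed_rk(A, b_noisy, decaying_rate, 20000), x_true)
print(err_const, err_decay)
```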

algorithm, artificial intelligence, machine learning

2202.12224

Country:

North America > United States > Virginia (0.04)
North America > United States > Oregon (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine Learning, Sep-1-2020

On the Regularization Effect of Stochastic Gradient Descent applied to Least Squares

Steinerberger, Stefan

We study the behavior of stochastic gradient descent applied to $\|Ax -b \|_2^2 \rightarrow \min$ for invertible $A \in \mathbb{R}^{n \times n}$, with solution $x = A^{-1}b$. We show that there is an explicit constant $c_{A}$ depending (mildly) on $A$ such that $$ \mathbb{E} ~\left\| Ax_{k+1}-b\right\|^2_{2} \leq \left(1 + \frac{c_{A}}{\|A\|_F^2}\right) \left\|A x_k -b \right\|^2_{2} - \frac{2}{\|A\|_F^2} \left\|A^T A (x_k - x)\right\|^2_{2}.$$ This is a curious inequality: the last term applies one more matrix to the error $x_k - x$ than the remaining terms do, so if $x_k - x$ is mainly composed of singular vectors associated with large singular values, stochastic gradient descent leads to quick regularization. For symmetric matrices, this inequality extends to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large singular values to small singular values has a smoothing effect.
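
The inequality can be checked numerically for the Kaczmarz form of SGD, where row $i$ is drawn with probability $\|a_i\|^2/\|A\|_F^2$ and $x_k$ is projected onto $\{z : \langle a_i, z\rangle = b_i\}$ (this form of the iteration is an assumption of the sketch). Expanding the expectation gives $\|Ax_k - b\|^2 - \frac{2}{\|A\|_F^2}\|A^TA(x_k - x)\|^2 + \frac{1}{\|A\|_F^2}\sum_i \langle a_i, x_k - x\rangle^2 \frac{\|Aa_i\|^2}{\|a_i\|^2}$, and since $\sum_i \langle a_i, x_k - x\rangle^2 = \|Ax_k - b\|^2$, the choice $c_A = \max_i \|Aa_i\|^2/\|a_i\|^2$ makes the inequality hold. The script uses that choice as an assumption (the paper's constant may be defined differently) and computes the expectation exactly by summing over rows, so no Monte Carlo error is involved. The helper names are hypothetical.

```python
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def rmatvec(M, v):
    """Compute M^T v without forming the transpose."""
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]


def sides(n=5, seed=0):
    """Return (E||A x_{k+1} - b||^2, right-hand side of the bound) for random data."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]   # exact solution
    b = matvec(A, x)
    xk = [rng.gauss(0.0, 1.0) for _ in range(n)]  # current iterate
    fro2 = sum(dot(a, a) for a in A)
    # Assumed constant (one admissible choice): c_A = max_i ||A a_i||^2 / ||a_i||^2.
    c_A = max(dot(matvec(A, a), matvec(A, a)) / dot(a, a) for a in A)
    # Exact expectation over row i drawn with probability ||a_i||^2 / ||A||_F^2,
    # where x_{k+1} is the Kaczmarz projection of x_k onto <a_i, z> = b_i.
    lhs = 0.0
    for a, b_i in zip(A, b):
        na2 = dot(a, a)
        r = (b_i - dot(a, xk)) / na2
        x_next = [xj + r * aj for xj, aj in zip(xk, a)]
        res = [u - v for u, v in zip(matvec(A, x_next), b)]
        lhs += (na2 / fro2) * dot(res, res)
    res_k = [u - v for u, v in zip(matvec(A, xk), b)]
    AtAe = rmatvec(A, matvec(A, [u - v for u, v in zip(xk, x)]))
    rhs = (1.0 + c_A / fro2) * dot(res_k, res_k) - (2.0 / fro2) * dot(AtAe, AtAe)
    return lhs, rhs


for seed in range(5):
    lhs, rhs = sides(seed=seed)
    print(f"seed={seed}: E||Ax_(k+1)-b||^2 = {lhs:.4f} <= {rhs:.4f}")
```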

artificial intelligence, machine learning, singular vector

2007.13288

Country:

North America > United States > New York (0.04)
North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Needell, Deanna, Ward, Rachel, Srebro, Nati

Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm

Neural Information Processing Systems, Feb-14-2020, 07:12:17 GMT

randomized kaczmarz algorithm, stochastic gradient descent, weighted sampling

Genre: Research Report (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)

Lei, Yunwen, Zhou, Ding-Xuan

Convergence of Online Mirror Descent Algorithms

arXiv.org Machine Learning, Feb-18-2018

In this paper we consider online mirror descent (OMD) algorithms, a class of scalable online learning algorithms that exploit the geometric structure of data through mirror maps. Necessary and sufficient conditions are presented in terms of the step size sequence $\{\eta_t\}_{t}$ for the convergence of an OMD algorithm with respect to the expected Bregman distance induced by the mirror map. The condition is $\lim_{t\to\infty}\eta_t=0, \sum_{t=1}^{\infty}\eta_t=\infty$ in the case of positive variances. It reduces to $\sum_{t=1}^{\infty}\eta_t=\infty$ when the variances are zero, in which case linear convergence may be achieved by taking a constant step size sequence. A sufficient condition for almost sure convergence is also given. We establish tight error bounds under mild conditions on the mirror map, the loss function, and the regularizer. Our results rely on a novel analysis of the one-step progress of the OMD algorithm using the smoothness and strong convexity of the mirror map and the loss function.
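
A minimal OMD instance, under assumptions chosen only for illustration: the negative-entropy mirror map on the probability simplex (its mirror step is the exponentiated-gradient update and its Bregman distance is the KL divergence), a hypothetical online loss $\tfrac12\|w - z_t\|^2$ with $z_t$ a noisy copy of a fixed point $p$ in the simplex, and step sizes $\eta_t = \eta_0/\sqrt{t}$, which satisfy $\lim_{t\to\infty}\eta_t = 0$ and $\sum_t \eta_t = \infty$. The stochastic gradients have positive variance at $p$, which we read as the positive-variance case of the abstract. Function names are hypothetical.

```python
import math
import random


def omd_entropy(grad, w0, steps, eta0=1.0, seed=0):
    """Online mirror descent on the probability simplex with the negative-entropy
    mirror map (the exponentiated-gradient update) and step sizes
    eta_t = eta0 / sqrt(t), which satisfy eta_t -> 0 and sum_t eta_t = infinity."""
    rng = random.Random(seed)
    w = list(w0)
    for t in range(1, steps + 1):
        g = grad(w, rng)
        eta = eta0 / math.sqrt(t)
        # Mirror step in the dual (log) space, then the Bregman (KL) projection
        # back onto the simplex, which for this mirror map is a normalisation.
        logits = [math.log(wi) - eta * gi for wi, gi in zip(w, g)]
        top = max(logits)  # subtract the max for numerical stability
        z = [math.exp(s - top) for s in logits]
        total = sum(z)
        w = [zi / total for zi in z]
    return w


# Hypothetical online problem: loss 0.5 * ||w - z_t||^2 with z_t = p + Gaussian noise,
# so the expected loss is strongly convex and minimised over the simplex at p.
p = [0.5, 0.3, 0.2]


def noisy_grad(w, rng, noise=0.1):
    z = [pi + rng.gauss(0.0, noise) for pi in p]
    return [wi - zi for wi, zi in zip(w, z)]


w = omd_entropy(noisy_grad, [1 / 3, 1 / 3, 1 / 3], steps=20000)
print([round(wi, 3) for wi in w])  # close to p
```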

algorithm, artificial intelligence, machine learning

1802.06357

Country:

Asia > China > Hong Kong > Kowloon (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.70)

Industry: Education > Educational Setting > Online (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Needell, Deanna, Srebro, Nathan, Ward, Rachel

Stochastic Gradient Descent, Weighted Sampling, and the Randomized Kaczmarz algorithm

arXiv.org Machine Learning, Jan-16-2015

We obtain an improved finite-sample guarantee on the linear convergence of stochastic gradient descent for smooth and strongly convex objectives, improving from a quadratic dependence on the conditioning $(L/\mu)^2$ (where $L$ is a bound on the smoothness and $\mu$ on the strong convexity) to a linear dependence on $L/\mu$. Furthermore, we show how reweighting the sampling distribution (i.e. importance sampling) is necessary in order to further improve convergence, and obtain a linear dependence on the average smoothness, dominating previous results. We also discuss importance sampling for SGD more broadly and show how it can improve convergence in other scenarios as well. Our results are based on a connection we make between SGD and the randomized Kaczmarz algorithm, which allows us to transfer ideas between the separate bodies of literature studying each of the two methods. In particular, we recast the randomized Kaczmarz algorithm as an instance of SGD, and apply our results to prove its exponential convergence, but to the solution of a weighted least squares problem rather than the original least squares problem. We then present a modified Kaczmarz algorithm with partially biased sampling which does converge to the original least squares solution with the same exponential convergence rate.
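
Partially biased sampling can be sketched as SGD on $F(x) = \sum_i \tfrac12(\langle a_i, x\rangle - b_i)^2$, where row $i$ has smoothness constant $L_i = \|a_i\|^2$. Each step draws row $i$ with probability $p_i = \tfrac12\cdot\tfrac1m + \tfrac12\cdot\|a_i\|^2/\|A\|_F^2$ (half uniform, half proportional to $L_i$) and scales its gradient by $1/p_i$, so the update is an unbiased estimate of $\nabla F$, whose minimiser is the ordinary least squares solution. The half/half mixture, the step size $\gamma = 1/(2\|A\|_F^2)$, the function names and the test problem are assumptions for illustration, not the paper's exact constants.

```python
import itertools
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def sgd_partially_biased(A, b, iters, seed=0):
    """SGD on F(x) = sum_i 0.5 * (<a_i, x> - b_i)^2 with partially biased sampling.

    Row i is drawn with p_i = 0.5/m + 0.5*||a_i||^2/||A||_F^2 and its gradient is
    weighted by 1/p_i, so each step is an unbiased estimate of grad F.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    norms2 = [dot(a, a) for a in A]
    fro2 = sum(norms2)
    p = [0.5 / m + 0.5 * s / fro2 for s in norms2]
    cum = list(itertools.accumulate(p))
    # Assumed step size: since p_i >= 0.5*||a_i||^2/||A||_F^2, gamma*||a_i||^2/p_i <= 1,
    # so no step moves past the hyperplane of its row.
    gamma = 1.0 / (2.0 * fro2)
    x = [0.0] * n
    for _ in range(iters):
        i = rng.choices(range(m), cum_weights=cum)[0]
        coef = (gamma / p[i]) * (dot(A[i], x) - b[i])
        x = [xj - coef * aij for xj, aij in zip(x, A[i])]
    return x


def demo(m=300, n=10, iters=6000, seed=2):
    """Relative error on a consistent system whose rows have very different norms."""
    rng = random.Random(seed)
    A = []
    for _ in range(m):
        scale = rng.uniform(0.2, 5.0)  # uneven row norms, hence uneven L_i
        A.append([scale * rng.gauss(0.0, 1.0) for _ in range(n)])
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [dot(a, x_true) for a in A]
    x = sgd_partially_biased(A, b, iters, seed=seed)
    err = math.sqrt(sum((u - w) ** 2 for u, w in zip(x, x_true)))
    return err / math.sqrt(dot(x_true, x_true))


print(demo())
```

Because every $p_i$ is at least half its fully biased value, $\gamma\|a_i\|^2/p_i \le 1$ for every row, so each step is at most a full Kaczmarz projection; the uniform half guarantees every row a sampling probability of at least $1/(2m)$.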

artificial intelligence, dependence, machine learning

1310.5715

Country: North America > United States (0.67)

Genre: Research Report (0.90)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)